DeepMHCII: a novel binding core-aware deep interaction model for accurate MHC-II peptide binding affinity prediction

Abstract Motivation Computationally predicting major histocompatibility complex (MHC)-peptide binding affinity is an important problem in immunological bioinformatics. Recent cutting-edge deep learning-based methods for this problem are unable to achieve satisfactory performance for MHC class II molecules. This is because such methods generate the input by simply concatenating the two given sequences: (the estimated binding core of) a peptide and (the pseudo sequence of) an MHC class II molecule, ignoring biological knowledge behind the interactions of the two molecules. We thus propose a binding core-aware deep learning-based model, DeepMHCII, with a binding interaction convolution layer, which allows to integrate all potential binding cores (in a given peptide) with the MHC pseudo (binding) sequence, through modeling the interaction with multiple convolutional kernels. Results Extensive empirical experiments with four large-scale datasets demonstrate that DeepMHCII significantly outperformed four state-of-the-art methods under numerous settings, such as 5-fold cross-validation, leave one molecule out, validation with independent testing sets and binding core prediction. All these results and visualization of the predicted binding cores indicate the effectiveness of our model, DeepMHCII, and the importance of properly modeling biological facts in deep learning for high predictive performance and efficient knowledge discovery. Availability and implementation DeepMHCII is publicly available at https://github.com/yourh/DeepMHCII. Supplementary information Supplementary data are available at Bioinformatics online.


Introduction
Major histocompatibility complex (MHC) molecules play a significant role in the T-cell mediated adaptive immune response (Janeway et al., 2005). MHC molecules first bind peptide fragments derived from pathogens and then present the peptides to the surface of antigen-presenting cells (APC). After the MHC-peptide complexes are recognized by T-cell receptors (TCR), adaptive immune response will be stimulated to fight against and eliminate invading pathogens. Accurate identification of MHC binding peptides is thus crucial for not only elucidating the mechanism of immune recognition but also facilitating the design of peptide-based vaccine and cancer immunotherapy (Blass and Ott, 2021). As biochemical experiments are time consuming and labor intensive, computational approaches for predicting MHC binding peptides have become increasingly important and have been utilized to prioritize a small number of promising candidates for further verification by biochemical experiments (Hu et al., 2010;Lund et al., 2005;Mamitsuka, 1998;Udaka et al., 2002;Zhu et al., 2006).
There are two major classes of MHC molecules: MHC Class I (MHC-I) and MHC Class II (MHC-II). MHC-I molecules have one chain (a) and MHC-II molecules have two chains (a and b). Human MHC-II molecules are encoded in the human leucocyte antigen (HLA) gene complex involving three types of molecules: DP, DQ, DR, while mouse MHC-II are encoded in the histocompatibility 2 (H-2) (Traherne, 2008). MHC-I and MHC-II molecules play different roles in adaptive immune response. MHC-I molecules bind a rather fixed length of short peptides (usually 8-11 amino acids) from endogenous antigen, and these peptides are presented to cytotoxic T lymphocytes. In contrast, MHC-II molecules bind a more diverse length of peptides from exogenous antigen, and these peptides are presented to helper T lymphocytes. It has also been reported that i220 This is an Open Access article distributed under the terms of the Creative Commons Attribution License (https://creativecommons.org/licenses/by/4.0/), which permits unrestricted reuse, distribution, and reproduction in any medium, provided the original work is properly cited. Bioinformatics, 38, 2022, i220-i228 https://doi.org/10.1093/bioinformatics/btac225 ISCB/ISMB 2022 MHC-II peptide binding involves the B-cell mediated adaptive immune response. From these aspects, predicting peptides binding to MHC-II is more challenging than that of MHC-I (Nielsen et al., 2010). Currently, the state-of-the-art methods for MHC-I peptide binding prediction can achieve an area under the ROC curve (AUC) of around 0.9, while AUC by the prediction methods for MHC-II is usually unable to reach 0.8, particularly for MHC-II molecules with very few or no binding peptide data (Jensen et al., 2018;Zhang et al., 2012b), which is far from satisfactory. Moreover, quantitative prediction of MHC peptide binding is more useful in practice than binary classification for selecting a small number of promising candidate peptides. It is thus imperative to develop accurate computational methods for MHC-II peptide binding affinity prediction.
The challenges come from two sides: peptides and MHC-II molecules. For the peptide side, the binding groove of MHC-II molecules is open at both ends, which causes large variation on the length of binding peptides, ranging from 10 to 30 amino acids (typically 12-16). The binding groove of MHC-II has nine pockets, where one amino acid residue of the binding core of a binding peptide fits to one pocket normally (Janeway et al., 2005). The peptide-MHC binding affinity is primarily determined by the interaction between MHC-II molecules and the binding core of peptides. However, it has been found that peptides flanking regions (PFRs) which are outside of the binding core also affect the binding affinity (Arnold et al., 2002;Holland et al., 2013). Thus, there are two issues for the peptide side: (i) the flexibility in length and (ii) the location of the binding core. For the MHC-II molecule side, MHC-II molecules are highly polymorphic. There are thousands of MHC-II molecules, and each MHC-II molecule has its own binding specificity. Also, MHC molecules are proteins to be represented by amino acid sequences with longer and more diverse lengths than peptides. In addition, currently, only dozens of MHC-II molecules have hundreds of binding affinity data in the immune epitope database and analysis resource (IEDB) (Vita et al., 2019), and a vast majority of MHC-II molecules have very few or even no binding data. Thus, there are three issues for the MHC-II side: (i) thousands of MHC-II molecules with different binding specificity, (ii) long and size-flexible sequences and (iii) data scarcity for most of the MHC-II molecules. These aspects have made it difficult to model the interaction of MHC-II peptide binding accurately.
Computational methods for MHC-II peptide binding prediction can be divided into two categories: allele-specific and pan-specific (Zhang et al., 2012b). Allele-specific methods can predict only binding preferences of MHC-II molecules in the training set, while pan-specific methods can predict binding preferences of MHC-II molecules even with no training data of these molecules, which is thus the focus of our work. Traditionally, pan-specific methods have been developed by various different techniques, such as position-specific scoring matrices (Zhang et al., 2012a), artificial neural network (ANN) (Jensen et al., 2018), kernel-based methods (Guo et al., 2013) and ensemble learning (Xu et al., 2016). The most established method is NetMHCIIpan [latest version for MHC-II peptide binding prediction is NetMHCIIpan-3.2 (Jensen et al., 2018)], an ANNbased method, which pioneered using pseudo sequence to represent an MHC-II molecule. ANN deals with only a fixed-sized input, and so NetMHCIIpan first estimates the binding core in a given peptide and then trains an ANN using the estimated binding cores and the pseudo sequences. The whole process is repeated until convergence. However, the first estimation of the binding core might be inaccurate, which affects the predictive performance heavily. Most recent pan-specific methods use deep learning (DL), such as convolutional neural network (CNN), long short-term memory and transformer to learn the interaction between MHC-II molecules and peptides. There exist four representative DL-based methods: PUFFIN (Zeng and Gifford, 2019), DeepSeqPanII (Liu et al., 2021), MHCAttnNet (Venkatesh et al., 2020) and BERTMHC (Cheng et al., 2021). In spite of using advanced DL techniques, these methods concatenate the sequence encoding of an MHC-II molecule and a peptide for the input, which do not take advantage of important domain knowledge, resulting in the lacking of performance improvement and interpretability.
We propose a novel deep learning-based method, DeepMHCII, for accurate MHC-II peptide binding affinity prediction by incorporating biological knowledge into designing the model architecture. Specifically, DeepMHCII is modeled, considering the following three distinct features: (i) binding core and PFRs in each peptide; (ii) pseudo sequence, i.e. the sequence with only crucial residues for directly interacting with the binding core of the counterpart peptide, in each MHC-II molecule. (iii) Interaction between a peptide and an MHC-II molecule by the interaction between all potential binding cores and the pseudo sequence. Note that these three features have not been explicitly addressed by any existing methods simultaneously. Specifically, DeepMHCII generates a binding interaction convolutional layer (BICL) with adaptive kernel size filters to model the interaction between peptides and MHC-II molecules.
The performance of DeepMHCII has been thoroughly validated by extensive experiments on four benchmark datasets under various settings, such as 5-fold cross-validation, leave one molecule out (LOMO), independent testing set verification and binding core prediction. We compared the predictive performance of DeepMHCII with four state-of-the-art methods: NetMHCIIpan-3.2 (Jensen et al., 2018), PUFFIN (Zeng and Gifford, 2019), DeepSeqPanII (Liu et al., 2021) and MHCAttnNet (Venkatesh et al., 2020). Experimental results demonstrate that DeepMHCII outperformed all competing methods in all experiments. The improvement was especially significant in LOMO and independent testing set verification. For example, DeepMHCII achieved an average AUC of 0.77 in independent testing, which was 7% and 10% higher than that of NetMHCIIpan-3.2 (0.719) and PUFFIN (0.70). In addition, DeepMHCII achieved an average Pearson correlation coefficient (PCC) of 0.560 in LOMO, which was 6.7% higher than that of PUFFIN (0.525). All these results indicate the effectiveness of DeepMHCII on predicting the binding specificity of unseen MHC-II molecules. We also verified the performance advantage and interpretability of DeepMHCII in identifying the binding core of peptides and binding motifs of MHC-II molecules.

Problem formulation
Suppose P ¼ ðp 1 ; p 2 ; . . . ; p L Þ denotes the peptide sequence and Q ¼ ðq 1 ; q 2 ; . . . ; q L 0 Þ denotes the L 0 -length MHC-II molecule sequence, where each of p i and q j stands for one of the 20 types of amino acids. The task is a regression problem to predict the binding affinityẑ 2 ½0; 1 by a given pair of P and Q. The binding affinity is mainly determined by the nine-length binding core (unknown) of the peptide and the binding groove with nine pockets in the MHC-II molecule. In addition, it has been found that peptides flanking regions (PFRs) of the binding core affect the binding affinity.
In practice, we use the 34-length pseudo sequence Q 0 extracted from Q as the representation of MHC-II molecules. Pseudo sequence of MHC-II refers to the important MHC residues that are considered being crucial for peptide binding. The 34-length MHC-II molecule pseudo sequence was first proposed in NetMHCIIpan-3.0 (Karosiene et al., 2013), which was composed of 15 and 19 amino acid residues from a and b chains of MHC-II, respectively. These residues were extracted from MHC-II peptide complexes in PDB (Burley et al., 2021), which were polymorphic in MHC molecules and found in close contact (<4 Å ) with peptide binding core in at least one MHC-II peptide complex. The 34-length MHC-II pseudo sequence has also been used in NetMHCIIpan-3.1 (Andreatta et al., 2015) and 3.2 (Jensen et al., 2018). Similarly, we use this pseudo sequence as the representation of MHC-II molecules.

Overview
The basic idea of DeepMHCII is (i) to use deep learning to explicitly model the interaction between an MHC-II molecule sequence and a peptide and (ii) considering the crucial residue for binding, to focus on only the interaction between the binding core in a peptide and counterpart important amino acid residues (pseudo sequence) in a MHC-II molecule. Specifically, the input of DeepMHCII is a L-length peptide sequence P and, a 34-length pseudo sequence Q 0 extracted from an MHC-II molecule sequence Q (Burley et al., 2021;Jensen et al., 2018). Figure 1 shows the architecture of DeepMHCII with mainly four steps: (i) we apply an embedding layer to the peptide sequence and also another embedding layer to the pseudo sequence of MHC-II to obtain deep semantic dense representations; (ii) we use a binding interaction convolutional layer (BICL) to obtain the representation of binding interaction between the potential binding cores and the MHC-II molecule; (iii) we use fully connected layers and a max-pooling layer to extract the interaction of peptide and MHC-II molecule; and (iv) we use an output layer to obtain the predicted binding affinity.

Input layer
We use an embedding layer to encode amino acid sequences for peptides and also similarly another embedding layer for MHC-II pseudo sequences. Let L be the length of an input peptide and d be the dimension of amino acid embeddings.
Then, for a given pair of a peptide sequence P and an MHC-II molecule pseudo sequence Q 0 ; X 2 R LÂd , the output of the embedding layer for P, and Y 2 R 34Âd , the output of the embedding layer for Q 0 , are given as follows: where x i 2 R d and y j 2 R d are the representation of the ith amino acid of the peptide sequence and the jth amino acid of the MHC-II pseudo sequence, respectively.

Binding interaction convolutional layer
We use a binding interaction convolutional layer (BICL) to obtain the representation of the interaction between peptide X and MHC-II molecule Y by considering all possible binding cores of X. Traditionally in sequence-based CNN, input sequences share the same kernels (filters), while in our problem, each MHC-II molecule has a distinguished binding preference. To address this issue, BICL generates different kernels for each MHC-II molecule. Specifically, letting M be the size of a binding core (¼9 in our problem), we use a weight matrix W k with the size of M Â 34 to generate the kth kernel as f ðW k YÞ, where f is the activation function. We can write a potential binding core of X starting with the ith residue as X i:iþMÀ1 . By using X i:iþMÀ1 and kernel f ðW k YÞ, the interaction between potential binding core X i:iþMÀ1 and the MHC-II molecule Y can be given as follows: where b k is the bias. Then the output of BICL can be written as C ð0Þ 2 R ðLÀMþ1ÞÂh ð0Þ , where h ð0Þ is the number of kernels.
Considering the effect of both binding core and PFRs, we used four different kernel sizes: 9, 11, 13 and 15. For example, kernel size of 15 is used to consider the effect of three more amino acids in the left and right side of binding core, respectively. For each kernel size, we used a different number of kernels. That is, the number of kernels for the kernel size of nine was the largest, followed by those of 11, 13 and 15. This setting will be described in detail in Section 3.2. Note that the padding symbol will be added to both side of X for kernel sizes other than 9 to get the same number of rows (number of potential binding cores) of output.

Max-pooling and output layer
We then use N fully connected layers and a max-pooling layer to obtain the representation g 2 R h ðNÞ of the interaction between peptide X and MHC-II molecule Y as follows: where 1 n N, and W ðnÞ 2 R h ðnÀ1Þ Âh ðnÞ ; b ðnÞ 2 R h ðnÞ and C ðnÞ 2 R ðLÀMþ1ÞÂh ðnÞ are the weights, bias and output of the nth fully connected layer, respectively.
Finally, we use the output layer to obtain the predicted binding affinityẑ 2 ½0; 1 as follows: where w ðoÞ 2 R h ðNÞ and b ðoÞ 2 R are the weights and bias, respectively. Our training objective is to minimize the mean square error. Practically, we trained T models with different random initial weights and used the average over the T predicted scores as the final prediction.

Binding core prediction
For a given pair of peptide sequence and MHC-II molecule, we remove max-pooling layer of DeepMHCII, use the output layer after N fully connected layers directly to obtain the predicted score of each nine-length potential binding core as follows: The nine-length subsequence of peptide with the highest score (ẑ k ) will be recommended as the binding core of this pair.

i222
R.You et al.

Datasets
We used four publicly available benchmark datasets (BD2016, ID2017, BD2020 and BC2015) to train and evaluate DeepMHCII and competing methods: BD2016 and ID2017 for MHC-peptide binding affinities, BD2020 for MHC-peptide binding classification and BC2015 for predicting binding cores. Below, we describe more details on each of the four datasets. BD2016 (http://www.cbs.dtu.dk/suppl/immunology/NetMHCIIpan-3.2): BD2016 contains 134 281 data points of MHC-peptide binding affinities over 80 different MHC-II molecules, including 36 HLA-DR, 27 HLA-DQ, 9 HLA-DP and 8 H-2 molecules. BD2016 was compiled for training NetMHCIIpan-3.2 (Jensen et al., 2018) from IEDB. The original, experimentally obtained IC50 binding value of each data point was transformed into the binding affinity with the range of [0,1] by 1 À logðIC 50 nMÞ= logð50; 000Þ. BD2016 already provides a 5-fold cross-validation (5-fold CV) split which groups the peptides with common motifs into the same fold. Table 1 shows a summary of BD2016.
ID2017: a dataset of MHC-peptide binding affinities was compiled from IEDB in 2017 for evaluating different MHC-II binding peptide prediction methods (Andreatta et al., 2018). From this dataset, we generated an independent test dataset, ID2017, by removing data points overlapped with BD2016 and retaining MHC-II molecules with more than 50 peptides for robust performance evaluation. There are 10 HLA-DB molecules with 857 peptides in ID2017.
BD2020: a binary classification dataset of MHC-II peptide binding, which was extracted by Venkatesh et al. (2020) from IEDB for training MHCAttnNet. Note that BD2020 has no quantitative binding affinities. BD2020 consists of 65 954 data points for 49 HLA-DRB molecules, where 36 035 are positive, 28 919 are negative and 5-fold CV split is also provided.
BC2015: a binding core benchmark, which was used to evaluate the performance of NetMHCIIpan-3.2 in identifying the binding core of an MHC-peptide complex. BC2015 consists of 51 complexes from PDB.
We have calculated the following two probabilities to evaluate the redundancy of the 5-fold CV split of two benchmark datasets, BD2016 (redundancy-reduced partition) and BD2020 (random partition). Outer-p is the probability that two peptides in different folds have a common 9-mer subsequence (we focused on 9-mer, due to the binding core length), while Inner-p is the probability that two peptides in the same fold have a common 9-mer subsequence. As shown in Table 2, we can clearly see that BD2016 is less redundant, since Outer-p is only an around two percent of Inner-P for BD2016 while Outer-p and Inner-p have the same value for BD2020.

Experimental settings
DeepMHCII used the following hyperparameter values, which were selected by 5-fold CV over BD2016: d (dimension of embeddings of amino acids) ¼ 16. The numbers of kernels of BICL with the kernel sizes of 9, 11, 13 and 15 were 256, 128, 64 and 64, respectively. N (number of fully connected layers) ¼ 2 and the sizes of nodes at the two layers were 256 and 128. f (activation function) was ReLU. We used batch normalization (Ioffe and Szegedy, 2015) after BICL and each of the fully connected layers. Also, we used dropout (Srivastava et al., 2014) with the drop rate of 0.25 to avoid overfitting. During the training process, the batch size was 128, the number of epochs was 20 and the optimizer we used was Adadelta (Zeiler, 2012) with the learning rate of 0.9 and weight decay of 1eÀ4. T (number of trained models) was 20. We implemented DeepMHCII by PyTorch (Paszke et al., 2019).
MHCAttnNet used the cross-entropy objective function and binary MHC-peptide binding dataset (BD2020) for model training. We then trained DeepMHCII with the same dataset and the cross-entropy objective function and compared the performance of MHCAttnNet obtained from the paper directly. Both DeepMHCII and MHCAttnNet used only one single model without any ensemble.
BERTMHC (Cheng et al., 2021) also can be a competing method, while the open-source implementation (https://github.com/ s6juncheng/BERTMHC) of BERTMHC was not consistent with the description in Cheng et al. (2021). We have then contacted the authors of BERTMHC, regarding this matter, but we were unable to receive any response (https://github.com/s6juncheng/BERTMHC/ issues/8). As a result, we did not use BERTMHC in the performance comparison.

Evaluation metrics
We set up binary classification: we used the area under the receiver operating characteristics curve (AUC) for each MHC-II molecule and reported the average AUC. Also to classify peptides into binders and non-binders, a binding threshold of 500 nM was used: All peptides with an IC 50 binding value < 500 nM (0.426 after transformation) were classified as binders. Since there are a large number of MHC molecules, we used a binomial test to check the statistical significance of performance difference (significance level was P-value < 0.05). In addition, we used the Pearson correlation coefficient (PCC) to examine the linear relationship between the predicted binding affinity and the true value.

Experimental results
We conducted the following four experiments to validate the predictive performance of DeepMHCII: (i) we examined the performance of DeepMHCII and the competing methods by 5-fold CV over BD2016. (ii) In order to validate the performance for unseen MHC-II molecules, following NetMHCIIpan-3.2, we conducted LOMO over BD2016 by using the above 5-fold CV set-up. Specifically, each time, data points of only one MHC-II molecule in the test fold were used for testing, while data points of all other MHC-II molecules in training folds were used for training over 5-fold CV settings. Furthermore, following the settings in NetMHCIIpan-3.2, out of all 81 MHC-II molecules, we focused on 61 molecules with more than 40 data points and at least three binders for the robustness of performance evaluation. (iii) We performed a performance comparison on DeepMHCII and competing methods on ID2017, i.e. the independent test set. (iv) We examined the performance of DeepMHCII using the same objective function as that of MHCAttnNet, to examine the robustness of DeepMHCII.  Table S2). The results are consistent with those of 5fold CV. DeepMHCII achieved the highest AUC and PCC of 0.785 and 0.560, respectively, which were 1.3% and 2.9%, respectively, higher than those achieved by NetMHCIIpan-3.2, the second-best method. Figure 2 plots the LOMO AUC results obtained by: DeepMHCII for y-axis and (Fig. 2a) NetMHCIIpan-3.2, (Fig. 2b) PUFFIN and (Fig. 2c) DeepSeqPanII for x-axis, where each dot corresponds to one MHC-II molecule. That is, if one dot appears above the diagonal line, DeepMHCII outperformed the competing method, regarding the MHC-II molecule corresponding to this dot. DeepMHCII outperformed NetMCHIIpan-3.2, PUFFIN and DeepSeqPanII on 43, 42, 51 out of all 61 molecules, respectively, all being statistically significant (two-tailed binomial test, P-value ¼ 1:87 Â 10 À3 ; 4:44 Â 10 À3 and 9:62 Â 10 À8 , respectively). We also found that all methods tend to perform poorly on MHC molecules of H-2. In particular, there are several molecules such as H-2-IAs and H-2-IEk where DeepMHCII performs poorly. These may be related to the lack of similar MHC molecules, the relatively small amount of test data and the unknown quality of data. On the whole, these results indicate that DeepMHCII is more robust and can deal with unknown MHC-II alleles better than the competing methods. Furthermore, we have reported the performance of DeepMHCII and PUFFIN over BD2016 under a more strict LOMO, where for each test MHC molecule, we removed the most similar molecules from the training sets to avoid 'easy' predictions. Detailed results are presented in the Supplementary material. We can see that the performance of DeepMHCII is still much better than PUFFIN. We did not use NetMHCIIpan-3.2 and DeepSeqPanII in this experiment, since no source codes of NetMHCIIpan-3.2 are provided and the performance of DeepSeqPanII is slower and worse than the other three methods. Table 4 reports the performance on each MHC-II molecule of ID2017, the independent testing set, by DeepMHCII and competing methods. Figure 3 shows the ROC curves of DeepMHCII and competing methods on the whole ID2017. DeepMHCII outperformed all competing methods on both AUC of the whole ID2017 in Figure 3 and average AUC over all MHC-II molecules in Table 4. Specifically, DeepMHCII achieved the best average AUC of 0.770, which was 7.1% higher than the second-best method, NetMHCIIpan-3.2 (0.719), and more than 10% higher than the other two competing methods, PUFFIN and DeepSeqPanII. Regarding the AUC of the whole testing set, DeepMHCII achieved the highest AUC of 0.775, which was 7.8%, 12.0% and 15.5% higher than NetMHCIIpan-3.2 (0.719), PUFFIN (0.692) and DeepSeqPanII (0.671), respectively. For the performance over each MHC-II molecule, DeepMHCII outperformed NetMCHIIpan-3.2 and PUFFIN in all 10 MHC-II molecules and DeepSeqPanII in 8 out of the ten MHC-II molecules.

Comparison with MHCAttnNet
We trained DeepMHCII with the cross-entropy objective function under the same 5-fold CV split over BD2020, where this objective function was used in MHCAttnNet (Venkatesh et al., 2020), one of the competing methods of DeepMHCII. Table 5 shows the performance of DeepMHCII and MHCAttnNet, where following the original paper of MHCAttnNet, we used AUC over the whole test set, instead of the average AUC per MHC-II molecule. The AUC and accuracy of DeepMHCII were 4.1% and 3.0%, respectively, higher than those of MHCAttnNet, further demonstrating the performance advantage of DeepMHCII.

Result analysis
In order to analyze (or to interpret) the experimental results of DeepMHCII, we conducted the following five experiments: (i) We compared DeepMHCII with DeepSeqPanII and NetMHCIIpan-3.2 in binding core prediction over BC2015. (ii) We visualized the binding motifs of MHC-II molecules by using sequence logos (Schneider and Stephens, 1990) and compared the sequence logos generated by DeepMHCII with those of DeepSeqPanII and NetMHCIIpan-3.2. (iii) To illustrate the biological perspective captured by DeepMHCII, we analyzed the weights (of nine pockets in the binding core) of BICL in DeepMHCII. (iv) We examined the performance of DeepMHCII under different kernel sizes (5-15). (v) We checked the time and space complexities of DeepMHCII, comparing with those of two competing methods, PUFFIN and DeepSeqPanII.

Binding core prediction
The last column of Table 3 shows the results of predicting the binding core over BC2015 (more detailed results are shown in Supplementary Table S3). Note that PUFFIN cannot predict binding cores and thus is not shown in this column. Also note that NetMHCIIpan-3.2 has one variant, 'NetMHCIIpan-3.2 (without offset)', which uses the original prediction results to identify the binding core. Out of all 51 pairs, DeepMHCII correctly predicted  (10). This result highlights the advantage of DeepMHCII of flexibly modeling the binding core and its high interpretability over competing methods.

Sequence logos
We visualized the binding motifs of MHC-II molecules obtained by each prediction method as sequence logos (Schneider and Stephens, 1990). Following the description in Nielsen et al. (2007), we first computed the binding scores of 100 000 random peptides from SwissProt and then selected the top 1% predicted binders to draw sequence logos (with default settings). Since PUFFIN does not have the ability to predict the binding core, we compared the sequence logos generated by DeepMHCII, NetMHCIIpan-3.2 and DeepSeqPanII. We focused on four MHC-II molecules, DRB1*04:01, DRB1*09:01, DRB1*12:02 and DRB1*13:01, in ID2017, where DeepMHCII outperformed NetMHCIIpan-3.2 most. Figure 4 shows the sequence logos of these four MHC-II molecules by different three methods. Each sequence logo has 1st to 9th positions (pockets) in the x-axis, where at each position, the total height [of letters (amino acids)] represents the relative information content (also importance) of the corresponding position in the motif, and the height of each letter shows the frequency of the corresponding amino acid in the position. It is widely observed and generally thought that P1 (pocket 1), P4, P6 and P9 are four primary anchors, which are most important for peptide binding (Rammensee et al., 1999). All four sequence logos of DeepMHCII are consistent with this widely accepted understanding. In contrast, DeepSeqPanII could not distinguish these four primary anchors from other five pockets clearly, and the sequence logos by NetMHCIIpan3.2 contained more noise at pockets, especially non-primary pockets in DRB1*12:02 and DRB1*13:01, making frequent amino acids at each pocket totally unclear. Furthermore, by taking a closer look at primary anchors, we could observe clear differences among prediction methods in amino acid preference in primary anchors. For example, P4 of DRB1    SYFPEITHI (Rammensee et al., 1999), an MHC binding motif database, P4 allows amino acid E. This information is consistent with the sequence logo of DeepMHCII. Overall, DeepMHCII could generate better sequence logos than the competing methods.

Analyzing weights of BICL
We examined the importance of each position in the binding core by checking the absolute value of the weight (obtained by BICL) for each position of the binding core. For each pocket, we summed the absolute weight values for each model and computed the mean and standard deviation from all T ¼ 20 models of DeepMHCII. Table 6 shows the mean and standard deviation of each pocket. The values of positions 1, 4, 6 and 9 (which are in boldface) were much larger than other positions, being consistent with that P1 (pocket 1), P4, P6 and P9 are primary anchors (Rammensee et al., 1999). From this result, we can see that DeepMHCII could learn biological knowledge from data well, showing the validity of DeepMHCII from a biological perspective.

Ablation experiments
We examined DeepMHCII with the single kernel size k, selecting k from 5, 7, 9, 11, 13 and 15, where we call DeepMHCII trained by the kernel size of k as DeepMHCII k (Note that original DeepMHCII uses four different kernel sizes: 9, 11, 13 and 15 at once). Table 7 reports the performance of 5-fold CV over BD2016 by DeepMHCII k . We have three findings: (i) DeepMHCII 9 outperformed DeepMHCII 5 and DeepMHCII 7 significantly. Specifically,  DeepMHCII 9 achieved PCC of 0.676, which was followed by DeepMHCII 7 (0.651) and DeepMHCII 5 (0.637). This is consistent with that the standard binding core is with nine amino acids.
(ii) DeepMHCII 11 , DeepMHCII 13 and DeepMHCII 15 achieved a similar performance, which was higher than DeepMHCII 9 . For example, both DeepMHCII 13 and DeepMHCII 15 achieved PCC of 0.686, which was higher than DeepMHCII 9 (0.676). This suggests that peptide franking region has some positive effect on MHC-II peptide binding. (iii) DeepMHCII achieved the best performance with AUC of 0.856 and PCC of 0.691 among all compared methods. All these results confirm that the high performance of DeepMHCII was obtained by incorporating biological knowledge into the model design.
In addition, we compared DeepMHCII with a vanilla CNN, which uses a convolutional layer instead of BICL on the peptide and concatenates representations of the peptide and allele directly. As shown in Table 8, we can see that the performance of DeepMHCII is much better than CNN. Table 9 shows the amount of time necessary for training and prediction of DeepMHCII and competing methods. All methods were run on a single Nvidia Titan X (pascal). The amount of time for training DeepMHCII is equivalent to that for training PUFFIN and is much smaller than that for training DeepSeqPanII, which is based on recurrent neural networks. More notably, for prediction, DeepMHCII is significantly faster than both PUFFIN and DeepSeqPanII. We further examined the size of The trained model between DeepMHCII and competing methods. As shown in the last column of the table, the size of DeepMHCII is only 1.4 Mbytes, which is far smaller than those of PUFFIN (24.1 Mbytes) and DeepSeqPanII (15.3 Mbytes). This is a practically sizable difference.

Conclusion and discussion
We have proposed a new deep learning model, DeepMHCII, for predicting peptide-MHC binding affinity, the binding core and important pockets in the binding core. Considering the biological properties behind peptide-MHC binding, DeepMHCII has explicitly incorporated the interaction processes between a peptide and an MHC-II molecule through interaction convolution layers, which makes us biologically understand the peptide-MHC binding core and predict the binding affinity.
Extensive experiments with four large-scale datasets have demonstrated that DeepMHCII significantly outperformed all four stateof-the-art competing methods under a wide variety of settings. Furthermore, DeepMHCII captured the motifs more precisely than the compared methods, verifying the high performance in predicting the binding core and also the important pockets. All these results proved the usefulness of DeepMHCII in terms of the high accuracy but also precise biological discovery and high scientific interpretability. A limitation of our study might be that we focus on peptide binding prediction other than epitope prediction, where the MHC binding peptide must be recognized by TCR. Possible future work would be to incorporate more biological knowledge into our model design to develop high-performance deep learning methods for epitope prediction (Blass and Ott, 2021).